Blind image quality assessment (BIQA) remains challenging because of the diversity of distortions and the variation in image content, which complicate distortion patterns across scales and make the BIQA regression problem harder. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has addressed learning strategies that help the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT). Together, these modules help the model learn complex distortion patterns and optimize the regression problem in a way that follows the easy-to-hard progression of human learning. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets. The results indicate that PMT-IQA outperforms the compared approaches and that both the MS and PMT modules improve the model's performance.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
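Transformer-based models have demonstrated their effectiveness in automatic speech recognition (ASR) tasks and have even outperformed the conventional hybrid framework. The main idea of Transformers is to capture the long-range global context within an utterance through self-attention layers. However, for scenarios such as conversational speech, this utterance-level modeling neglects contextual dependencies that span utterances. In this paper, we propose to explicitly model inter-sentential information in a Transformer-based end-to-end architecture for conversational speech recognition. Specifically, for the encoder network, we capture the context of previous speech and incorporate this historical information into the current input through a context-aware residual attention mechanism. For the decoder, the prediction of the current utterance is also conditioned on historical linguistic information through a conditional decoder framework. We show the effectiveness of the proposed method on several open-source dialogue corpora, where it consistently improves on utterance-level Transformer-based ASR models.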
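This paper addresses the problem of robots collaboratively towing a load with cables to a specified goal location while avoiding collisions in real time. The introduction of cables (as opposed to rigid links) lets the robot team change its intrinsic dimensions through slack/taut switches of the cables and thereby travel through narrow spaces. However, this is a challenging problem because of the hybrid mode switches and the dynamic coupling among multiple robots and the load. Previous attempts at such problems were performed offline and did not consider online obstacle avoidance. In this paper, we introduce a cascaded planning scheme with a parallelized centralized trajectory optimization that handles hybrid mode switches. We also develop a set of decentralized planners, one per robot, which allows our approach to solve the collaborative load manipulation problem online. We develop and demonstrate one of the first collaborative autonomy frameworks able to move a cable-towed load that is too heavy for a single robot through narrow spaces, with real-time feedback and reactive planning in experiments.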
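Transformers have been widely applied to text classification. Unfortunately, real-world data contain anomalies and noisy labels that pose challenges for state-of-the-art Transformers. This paper proposes Protoformer, a novel self-learning framework for Transformers that can leverage problematic samples for text classification. Protoformer features a selection mechanism for embedded samples that allows us to efficiently extract and use anomaly prototypes and difficult class prototypes. We demonstrate these capabilities on datasets with diverse textual structures (e.g., Twitter, IMDB, ArXiv). We also apply the framework to several models. The results indicate that Protoformer can improve current Transformers in various empirical settings.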
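Self-supervised learning approaches such as contrastive learning have attracted great attention in natural language processing. Contrastive learning uses pairs of augmented training data to build a classification task for an encoder with good representation ability. However, constructing learning pairs for contrastive learning is much harder in NLP tasks. Previous works generate word-level changes to form pairs, but small transformations may cause notable changes in sentence meaning because of the discrete and sparse nature of natural language. In this paper, adversarial training is used to generate challenging, harder adversarial examples in the NLP embedding space as learning pairs. Contrastive learning improves the generalization ability of adversarial training because the contrastive loss makes the sample distribution more uniform. At the same time, adversarial training enhances the robustness of contrastive learning. Two novel frameworks, supervised contrastive adversarial learning (SCAL) and unsupervised SCAL (USCAL), are proposed, which produce learning pairs by using adversarial training for contrastive learning. The label-based loss of supervised tasks is exploited to generate adversarial examples, while unsupervised tasks rely on the contrastive loss. To validate the effectiveness of the proposed frameworks, we apply them to Transformer-based models for natural language understanding, sentence semantic textual similarity, and adversarial learning tasks. Experimental results on GLUE benchmark tasks show that our fine-tuned supervised method outperforms BERT$_{base}$ by more than 1.75%. We also evaluate our unsupervised method on semantic textual similarity (STS) tasks, where it obtains 77.29% with BERT$_{base}$. In terms of robustness, our approach achieves state-of-the-art results on multiple adversarial datasets for NLI tasks.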
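We present techniques for scaling Swin Transformer up to 3 billion parameters and making it capable of training with images of up to 1,536$\times$1,536 resolution. By scaling up capacity and resolution, Swin Transformer sets new records on four representative vision benchmarks: 84.0% top-1 accuracy on ImageNet-V2 image classification, 63.1/54.4 box/mask mAP on COCO object detection, 59.9 mIoU on ADE20K semantic segmentation, and 86.8% top-1 accuracy on Kinetics-400 video action classification. Our techniques are generally applicable to scaling up vision models, which has not been explored as widely as for NLP language models, partly because of the following difficulties in training and application: 1) vision models often face instability issues at scale, and 2) many downstream vision tasks require high-resolution images or windows, and it is not clear how to effectively transfer models pre-trained at low resolution to higher resolution. GPU memory consumption is also a problem when the image resolution is high. To address these issues, we present several techniques, illustrated using Swin Transformer as a case study: 1) a normalization technique and a scaled cosine attention approach to improve the stability of large vision models; 2) a log-spaced continuous position bias technique to effectively transfer models pre-trained on low-resolution images and windows to their higher-resolution counterparts. In addition, we share crucial implementation details that lead to significant savings in GPU memory consumption and thus make it feasible to train large vision models with regular GPUs. Using these techniques and self-supervised pre-training, we successfully train a strong 3B Swin Transformer model and effectively transfer it to various vision tasks involving high-resolution images or windows, achieving state-of-the-art accuracy on a variety of benchmarks.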
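As a rough illustration of the scaled cosine attention named in the abstract above, the following minimal sketch replaces the usual dot-product logits with cosine similarity divided by a temperature. The function names, the default `tau`, the 0.01 floor on `tau`, and the optional additive `bias` argument are assumptions made for this example; the abstract does not spell out these details, and this is not the authors' implementation.

```python
import math

def cosine_similarity(u, v, eps=1e-12):
    """Cosine of the angle between two vectors (eps guards against zero norms)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_cosine_attention(queries, keys, values, tau=0.1, bias=None):
    """Attention whose logits are cos(q, k) / tau (+ optional position bias).

    tau is a temperature (learnable in a real model, fixed here); the 0.01
    floor is an assumption of this sketch to keep the logits bounded.
    """
    tau = max(tau, 0.01)
    dim = len(values[0])
    outputs = []
    for i, q in enumerate(queries):
        logits = [
            cosine_similarity(q, k) / tau + (bias[i][j] if bias else 0.0)
            for j, k in enumerate(keys)
        ]
        weights = softmax(logits)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs
```

Because the logits depend only on the angle between query and key, rescaling a query leaves the output unchanged. Bounding the logits this way is one plausible reason such an attention variant could help the stability the abstract refers to.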
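High-precision camera re-localization in a pre-established 3D environment map is the basis for many tasks, such as augmented reality, robotics, and autonomous driving. Point-based visual re-localization methods have become well developed in recent decades, but they are insufficient in some feature-poor cases. In this paper, we design a complete pipeline for camera pose refinement using points and lines, which contains an innovatively designed line-extraction CNN named VLSE, a line matching method, and a pose optimization method. We adopt a novel line representation and customize a hybrid convolution block based on the Stacked Hourglass network to detect accurate and stable line features in images. We then apply a geometry-based strategy that uses the epipolar constraint and reprojection filtering to obtain precise 2D-3D line correspondences. A point-line joint cost function is then constructed to optimize the camera pose, starting from the initial coarse pose given by purely point-based localization. Extensive experiments on open datasets, namely the line extractor on Wireframe and the localization performance on InLoc DUC1 and DUC2, confirm the effectiveness of our point-line joint pose optimization method.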
With the development of technology and the sharing economy, Airbnb, a well-known short-term rental platform, has become the first choice of many young people. Airbnb pricing has long been a problem worth studying. While previous studies achieve promising results, several deficiencies remain: (1) the feature attributes of rentals are not rich enough; (2) the research on rental text information is not deep enough; (3) few studies predict the rental price in combination with the points of interest (POI) around the house. To address these challenges, we propose a multi-source information embedding (MSIE) model to predict Airbnb rental prices. Specifically, we first select statistical features to embed the original rental data. Second, we generate word feature vectors and emotional scores from three different kinds of text information and combine them to form the text feature embedding. Third, we use the POI around the rental house to generate a variety of spatial network graphs and learn embeddings of these networks to obtain the spatial feature embedding. Finally, we combine the three modules into multi-source rental representations and use a fully connected neural network to predict the price. Analysis of the experimental results shows the effectiveness of the proposed model.
It has been observed in practice that applying pruning-at-initialization methods to neural networks and training the sparsified networks can not only retain the testing performance of the original dense models, but sometimes even slightly boost generalization performance. A theoretical understanding of these experimental observations is yet to be developed. This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization. Specifically, this work considers a classification task for overparameterized two-layer neural networks, where the network is randomly pruned at different rates at initialization. It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero and the network exhibits good generalization performance. More surprisingly, the generalization bound gets better as the pruning fraction gets larger. To complement this positive result, this work further shows a negative result: there exists a large pruning fraction such that, while gradient descent is still able to drive the training loss toward zero (by memorizing noise), the generalization performance is no better than random guessing. This further suggests that pruning can change the feature learning process, which leads to the performance drop of the pruned neural network. To our knowledge, this is the first generalization result for pruned neural networks, suggesting that pruning can improve the neural network's generalization.
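The setting described in the abstract above, a two-layer network randomly pruned at initialization, can be sketched in a few lines. This sketch assumes each first-layer weight is dropped independently with probability `prune_fraction`, Gaussian initialization, and a ReLU activation; these choices and all function names are illustrative assumptions, not the construction analyzed in the paper.

```python
import random

def random_prune_mask(n_hidden, n_inputs, prune_fraction, seed=0):
    """Return a 0/1 mask that drops each first-layer weight independently
    with probability `prune_fraction` (assumed Bernoulli pruning)."""
    if not 0.0 <= prune_fraction < 1.0:
        raise ValueError("prune_fraction must be in [0, 1)")
    rng = random.Random(seed)
    return [
        [0 if rng.random() < prune_fraction else 1 for _ in range(n_inputs)]
        for _ in range(n_hidden)
    ]

def init_weights(n_hidden, n_inputs, seed=1):
    """Standard Gaussian initialization of the first-layer weights (assumed)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_hidden)]

def forward(x, weights, mask, second_layer):
    """Two-layer ReLU network whose first layer is masked element-wise."""
    hidden = []
    for w_row, m_row in zip(weights, mask):
        pre_activation = sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        hidden.append(max(pre_activation, 0.0))
    return sum(a * h for a, h in zip(second_layer, hidden))
```

In a training loop the same mask would be kept fixed and applied to the gradients as well, so that pruned weights stay at zero throughout gradient descent.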